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Abstract 

We provide a framework to bound the probability that accumulated errors were never above a 
given threshold on hybrid systems. Such systems are used for example to model an aircraft or a nu- 
clear power plant on one side and its software on the other side. This report contains simple formulas 
' based on Levy's and Markov's inequalities and it presents a formal theory of random variables with 

£N) , a special focus on producing concrete results. We selected four very common applications that fit in 

our framework and cover the common practices of hybrid systems that evolve for a long time. We 
, compute the number of bits that remain continuously significant in the first two applications with a 

P_h ■ probability of failure around one against a billion, where worst case analysis considers that no signif- 

,— j- , icant bit remains. We are using PVS as such formal tools force explicit statement of all hypotheses 

("s^ ' and prevent incorrect uses of theorems. 

^ ; 1 Introduction 

c/3 ', Formal proof assistants are used in areas where errors can cause loss of life or significant financial 

damage as well as in areas where common misunderstandings can falsify key assumptions. For this 
reason, formal proof assistants have been much used for floating point arithmetic 0] HI [2 01 HI |6) and 
probabilistic or randomized algorithms QO. Previous references link to a few projects using proof 
assistants such as ACL2 @, HOL flU, Coq fll] and PVS lfl2l . 

All the above projects that deal with floating point arithmetic aim at containing worst case behavior. 
Recent work has shown that worst case analysis may be meaningless for systems that evolve for a long 
time as encountered in the industry. A good example is a process that adds numbers in ±2 with a measure 
error of ±2~ 24 . If this process adds 2 25 items, then the accumulated error is ±2, and note that 10 hours 
of flight time at operating frequency of 1 kHz is approximately 2 25 operations. Yet we easily agree 
that provided the individual errors are not correlated, the actual accumulated errors will continuously be 
much smaller than ±2. 

We present in Section |2] a few examples were this work can be applied. We focus on applications 
for n counting in billions and a probability of failure about one against a billion. Should one of 
these constraints be removed or lessened, the problems become much simpler. The main contribution of 
this work is the selection of a few theorems amenable to formal methods in a reasonable time, their 
application to software and systems reliability, and our work with PVS. Section [3] presents the formal 
background on probability with Markov's and Levy's inequality and how to use this theory to assert 
software and system reliability. 

Doob-Kolmogorov's inequality was used in previous work lfl3l . It is an application of Doob's in- 
equality that can be proved with elementary manipulations for second order moment. It is better than 
Levy's inequality in the sense that it can applied to any sum of independent and centered variables. Yet 
it is limited by the fact that it bounds only second order moments. 



2 Applications 

Levy's inequality works with independent symmetric random variables as we safely assumed in Sec- 
tions [2j] and |2T2] Doob's inequality combined with Jensen's one will overcome this restriction in future 
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Listing 1 : Accumulation or dot product 



1 flo = ; 

2 for (z'=l; i<=n; i = i+l) 

3 a,- = a,_ i + di ; 



formal developments for the applications presented in Section [231 that cannot be treated by Levy's in- 
equality. Alas, we foresee that the effort to make Doob's inequality available in any of the formal tools 
available today is at least a couple of years. Automatic treatment of all the following applications may 
use interval arithmetic that has been presented in previous publications and is now available in formal 
tools HEE). 

2.1 Long accumulations and dot products 

A floating point number represents v = m x 2 e where e is the exponent, an integer, and m is the mantissa 
Ifl4l . IEEE 754 standard iPTBTl on floating point arithmetic uses sign-magnitude notation for the mantissa 
and the first bit bo of the mantissa is implicit in most cases (bo = 1) leading to the first definition in 
equation £[]). Some circuits such as the TMS320 [16] use two's complement notation for m leading to 
the second definition in equation dTJ. The sign s and all the bj are either or 1 (bits). 

v = (-l) s xb .bi---b p ^ l x2 e or v = (bo-bi ■ ■ -b p -\ - 2 x s) x 2 e (1) 

In fixed point notation e is a constant provided by the data type and bo cannot be forced to 1. We define for 
any representable number v, the unit in the last place function below, with the notations of equation (Q]). 

ulp(v) = 2 e - p+1 

The example given in Listing Q] sums n values. When the accumulation is performed with floating 
point arithmetic each iteration introduces a new round-off error X,-. One might assume that X follows a 
continuous or discrete uniform distribution on the range ±w with u = ulp(a i )/2 as trailing digits of num- 
bers randomly chosen from a logarithmic distribution ifTTl pp. 254-264] are approximately uniformly 
distributed lfT8l . A significantly different distribution may mean that the round-off error contains more 
than trailing digits. 

Errors created by operators are discrete and they are not necessarily distributed uniformly |fl9ll . The 
distribution is very specific but as soon as we verify that it is symmetric we only have to bound the 
moments involved in our main result as in equation (|2]). 

E(X) = 0, E(X, 2 )<^, E(X*)<j, E(X, 6 )<^, and E(X, 8 )<^ (2) 

If di uses a directed rounding mode, we introduce X/ = X — E(X) and we use equation §2} again. We 
may also assume that di also carries a single error Xi^ or a linear combination of m — 1 round-off errors 
X2 5 ,-, . . . ,X m i - such that all of them satisfy equation (O for a given u. 

If di is a data obtained by an accurate sensor, we may assume that the difference between di and the 
actual value dj follows a normal distribution very close to a uniform distribution on the range ±w with 
some new value of u. In this cases we model the error dj — dj by a symmetric random variable X2 ;i and 
we use equation ©. 
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Table 1 : Number of significant bits with a probability of 



failure P ( max > e ) bounded by P 

\1<i<n / 



u 


n 


m 


p 


2k 


£ ~ or £ ~ 


1o2t £ ~ or 1o2t £ ~ 


2-24 


10 9 


2 


1(T 9 


2 


68.825 


+6.10 










4 


0.42832 


-1.22 










6 


085786 


-3.54 










8 


0.040042 


-4.64 










44 


0.010153 


-6.62 


2-24 


10 9 


10 


10 -io 


2 


486.66 


+8.92 










4 


1.7031 


+0.768 










6 


0.28156 


-1.82 










8 


0.11939 


-3.06 










48 


0.023873 


-5.38 


u 




1 




2 


^4 M - 2 /9 


(-log 2 « + l-log 2 3)/2 










4 




(-log 2 w + 2-21og 2 3)/8 










6 


'^100/81 


(log 2 10-21og 2 3)/6 










8 


'{/4900w/729 


(log 2 u + 2 log 2 70 - 6 log 2 3 ) / 1 6 



After n iterations and assuming that all the errors introduced, Xj,X2j, ■ ■ ■ ,X m j are symmetric and 
independent, we want the probability that the accumulated errors have exceeded some user specified 
bound £: 



max (\Si\) >e)<P with S„ 

i<i<n ' 



Xi+ < 



-E(Xi) 



if centering is needed 
if more variables are needed I . (3) 



Previous work used Doob-Kolmogorov's inequality We will see in Section [3] that we can exhibit 
tighter bounds using Levy's inequality followed by Markov's one. Table [Qpresent the number of signifi- 
cant bits of the results log 2 £ for some values of u, n, m, and P in equation (©. These values are obtained 
by using one single value of u, as large as needed. Tighter results can be obtained by using a specific 
value of u for each random variable and each iteration. 



2.2 Recursive filters operating for a long time 

Recursive filters are commonly used in digital signal processing and appear for example in the programs 
executed by Flight Control Primary Computers (FCPC) of aircraft. Finite impulse response (FIR) filters 
usually involve a few operations and can be treated by worst case error analysis. However infinite impulse 
response (IIR) filters may slowly drift. 

Theory of signal processing provides that it is sufficient to study second order IIR with coefficient b\ 
and &2 such that polynomial X 2 — b\X — hi has no zero in R. Listing |2] presents the pseudo-code of one 
such filter. A real implementation would involve temporary registers. 

When implemented with fixed or floating point operations, each iteration introduces a single error X\ 
or a compound one X\j H \-X m j in y>j. As these filters are linear, we study the response to do = 1 and 
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Listing 


; 2: Infinite impulse response (IIR) filter 


y-i=0; 
for (/ = 
yi = di 


>'o = d ; 
= 1 ; i<= 

-hyi-i - 


n; i = i 

hyi-2 ; 






Listing 3: Sum of squares 


ao = 0; 
for ( i = 

at = di- 


= 1 ; i <= 
- 1 +dj* di 


n ; i = i 

> 







di = otherwise, to deduce the accumulated effect of all the errors on the output of the filter. It is defined 
as the sequence of real numbers such that 

y-\ = 0, y = 1, and y n = -b\y n -\ - b 2 y n -2 for all n > 1. 

This sequence can also be defined by the expression 

y„ = b^ 2 \Jb 2 + 1b\ cos (cOq + n(o) with constants (Oq and ft). 

If the filter is bounded-input bounded-output (BIBO) stable, < b2 < 1 and the accumulated effect of the 
round-off errors is easily bounded by \Jb 2 + 1b\ / (1 — \fbj). Worst case error analysis is not possible 
on BIBO unstable systems. Our work and the example of Table[T]can be applied to such systems. 



2.3 Long sums of squares and Taylor series expansion of programs 

The previous programs introduce only first order effect of the round-off errors. We present here systems 
that involve higher order errors such as sum of square in Listing [3] and power series of all the random 
variables as in equation 01 

Assuming that J, carries an error X, in Listing [3l its contribution to the sum of square cannot be 
assumed to be symmetric. Levy's inequality cannot be applied, but Doob's inequality provides a similar 
result though it is out of reach with current formal tools and libraries. 

The output of a system can always be seen as a function F of its input and its state (do, . . . ,d n ). 
This point of view can be extended by considering that the output of the system is also a function of the 
various round-off errors (Xq, . . . ,X q ) introduced at run-time. Provided this function can be differentiated 
sufficiently, Taylor series expansion provides that 

F(do,...,d n ,X ,...,X q ) = F(do,...,d n ,0,...,0) 

r 1 f" dF\ lm] 

~xm\\p Q dXiJ (4 ) 

(4 dF\ [m+1] 

+ (E^j^J {do,...,d n ,6o,...,6 q ), 
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where 0, is between and D ; , and (-)[ m l is the symbolic power defined as 




dF 

dXi 



) 



[m] 



d m F 



dX h •••dXi 

1 1 l m 



0<h,-,'m<q 



When the Taylor series is stopped after m = 2, we can use Doob's inequality to provide results similar 
to the ones presented in Table Q] provided X,- are symmetric and independent. Higher order Taylor series 
don't necessarily create sub-martingales but weaker results can be obtained by combining inequalities 
on sub-martingales. 

3 Formal background on probability 

3.1 A generic and formal theory of probability 

We rebuilt the previously published theory of probability spaces lTT3l as a theory of Lebesgue's integra- 
tion recently became fully available. The new PVS development in Figure [T] still takes three parameters: 
T, the sample space, 5?, a a-algebra of permitted events, and P, a probability measure, which assigns to 
each permitted event in S?, a probability between and 1. Properties of probability that are independent 
of the particular details of T, 5?, and P are then provided in this file. 

A random variable X is a measurable application from (T, SP) to any other measurable space (T' 
In most theoretical developments of probability T, S", and P remain generic as computations are carried 
on T'. Results on real random variables use T' = R whereas results on random vectors use T' = R". Yet 
both theories refer explicitly to the Borel sets of T"'. 

As the Borel sets of W are difficult to grasp, most authors consider finite T and 5? = &(T) for 
discrete random variables in introductory classes. This simpler analysis is meant only for educational 
purposes and most results of probability considered for formal methods can be implemented with generic 
T, 5?, and P parameters. 

Handling discrete and continuous random variables through different T and 5? parameters is not 
necessary and it is contrary to most uses of probability spaces in mathematics. Such variables can be 
described on the same generic T, , and P parameters in spite of their differences. In practice, we use 
T' = R or T' = R'\ and for discrete variables we can choose countable codomains. 

Similarly, many authors work on sections {X < x} rather than using the inverse images of Borel 
sets of T' because the latter are difficult to visualize. Such a simplification is valid thanks to Dynkin's 
systems. But using abstract Borel sets rather than sections in formal methods often leads to easier proofs. 

3.2 A concrete theory of expectation 

The previous theory of random variables |[T3l made it possible to define them and to use and derive 
their properties. Very few results were enabling users to actually compute concrete results on random 
variables. Most of such results lie on a solid theory of the expected value. As most theorems in the later 
theory are corollaries of a good theory of Lebesgue's integration, we have developed a formal measure 
theory based on Lebesgue's integration and we develop formal theorems on expected values as needed 
in our applications. 

The expected value is the (unique) linear and monotonous operator E on the set of P-integrable 
random variables that satisfies Beppo-Levy's property and such that E(xi) = P(A) for all A G y '. We 
can also use the following definition when Lebesgue's integral exists: 
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probability_space [T : TYPE+ , (IMPORTING topology@subset_algebra_def [T] ) 

S:sigma_ algebra, (IMPORTING probability_measure [T,S] ) 
P :probability_measure] : THEORY 

BEGIN 

IMPORTING topology@sigma_algebra[T,S] , 
probability_measure [T,S] , 
continuous_f unctions_aux [real] , 
measure_theory@measure_space [T , S] , 
measure_theory@measure_props [T , S , to_measure (P) ] 

limit: MACRO [(convergence_sequences . convergent) ->real] 

= convergence_sequences . limit 



h : 


VAR 


borel_f unction 


A,B: 


VAR 


(S) 


x,y: 


VAR 


real 


nOz: 


VAR 


nzreal 


t: 


VAR 


T 


n: 


VAR 


nat 



X,Y: VAR random_variable 

XS: VAR [nat->random_variable] 

null?(A) :bool = P(A) = 

non_null?(A) :bool = NOT null?(A) 

independent? ( A, B) :bool = P(intersection(A,B) ) = P(A) * P(B) 

zero: random_variable = (LAMBDA t: 0) 
one: random_variable = (LAMBDA t: 1) 

<=(X,x):(S) = {t I X(t) <= x}; °/„ Needed for syntax purposes! <> = /=>= omitted 
complement_lel : LEMMA complement(X <= x) = (x < X) % More omitted 

+ (X,x) : random_variable = (LAMBDA t: X(t) + x) ; °/ Needed for syntax purposes! More omitted 

borel_comp_rv_is_rv: JUDGEMENT o(h,X) HAS_TYPE random_variable 

partial_sum_is_random_variable : 

LEMMA random, variable? (LAMBDA t: sigma(0,n, LAMBDA n: XS(n)(t))) 

distribution_function?(F: [real->probability] ) :bool 

= EXISTS X: FORALL x: F(x) = P(X <= x) 
distribution_f unction: TYPE+ = (distribution_f unction?) CONTAINING 

(LAMBDA x: IF x < THEN ELSE 1 ENDIF) 
distribution_f unction(X) (x) probability = P(X <= x) 

convergence_in_distribution? (XS,X) :bool 

= FORALL x: continuous (distribution_function(X) ,x) IMPLIES 

convergence ( (LAMBDA n: distribution_f unction(XS (n) ) (x) ) , 
distribution_f unction(X) (x) ) 

invert_distribution: LEMMA LET F = distribution_f unction(X) IN 

P(x < X) = 1 - F(x) 
interval_distribution: LEMMA LET F = distribution_f unction(X) IN 

x <= y IMPLIES 

P(intersection(x < X, X <= y)) = F(y) - F(x) 
limit_distribution: LEMMA LET F = distribution_f unction(X) IN 

P(X = x) = F(x) - limit (LAMBDA n: F(x-l/(n+l) ) ) 

F: VAR distribution_f unction 

distribution^ : LEMMA convergence(F o (lambda (n:nat): -n),0) 

distribution^ : LEMMA convergence (F, 1) 

distribution_increasing: LEMMA increasing? [real] (F) 

distribution_right_continuous : LEMMA right_continuous (F) 
END probability_space 



Figure 1 : Abbreviated probability space file in PVS 
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Markov's inequality below is heavily used to obtain concrete properties on random variables. 
Theorem 1 (Markov's inequality). For any random variable X and any constant e, 

F( M > E )<»i. 

Many theorems relate to independent random variables and their proof are much easier once inde- 
pendence is well defined. The family (Xf, . . . ,X n ) is independent if and only if, for any family of Borel 
sets (B 1 ,...,B n ), 

r(f] = f[P(X i eB i ). 

The following characteristic property is used a lot on families of independent variables: 
For any family of Borelean functions (hi,... ,h n ) such that the hi(Xj) are P-integrable, 

It is worth noting that the fact that n random variables are independent is not equivalent to the fact 
that any pair of variables is independent and cannot be built recursively from n — 1 independent random 
variables. 

Future work may lead us to implement a theory of the law Fx associated to each random vector 
X : T — > W, with a "transfer" theorem for any Borelean function h : W — ► M below and most properties 
of Lebesgue's integral including Fubini's theorem. 

E(h(X)) = fh(X) dP = / hdF x 

J7 JW 



3.3 Almost certain a priori error bound 

What we are actually interested in is whether a series of calculations might accumulate a sufficiently large 
error to become meaningless. In the language we have developed, we are computing the probability that 
a sequence of n calculations has failed because it has exceeded the e error-bound somewhere. 

Theorem 2 (Corollary of Levy's inequality). Provided the (X n ) are independent and symmetric the 
following property holds for any constant e. 

P( maxflS/l) >£ ) <2P(|5„| > e) 

\l<i<n / 



Proof. We use a proof path similar to the one published in |[20l . We define S n below with Dirichlet's 
operator dp that is equal to 1 if the predicate holds and otherwise. As the X n are symmetric, the random 
variables S n and 5„ share the same probability density function. 

si j) = f j (-l) S ->'X i 
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We now define N = inf{& such that \St\ > s} with the addition that inf = +00 and similarly N^' = 
inf{A: such that > £}. Events maxi< ( <„(|5 ; |) > £ and < n are identical. Furthermore 

P(|S„|>e) = £P(|S„|>£ nAf = 7) = £lP(|^ ) |>£ HiV = ;j. 
;=i ;'=i 

As soon as j < n, 2Sj = 5 n +5„ and 2\Sj\ = \S n \ + Therefore, the event {\Sj\ > e} is included 
in{|S„| >e}U{\S { n j) \ >£}and 

p(N< w ) = £p(|5 ; | >£niv = j)< £p(|5„| >£ n A^ = j') + E p (l ,s »' /) l - e n N = J 

;=i ;=i 7=1 

This ends the proof of Levy's inequality. □ 

Should we need to provide some formula beyond the hypotheses of Levy's inequality, we may have 
to prove Doob's original inequality for martingales and sub-martingales |2D in PVS. It follows a proof 
path very different from Doob-Kolmogorov's inequality but it is not limited to second order moment and 
it can be applied to any sub-martingale Sf with k > 1 to lead to 

E(Sl k ) 



P( max (\Si\) > £ ) < 

\l<i<n J 



£ 2k 



Shall we need to create a sub-martingale different from Sj k , we may have to prove Jensen's con- 
ditional inequality that let us introduce where h : M + — > M + is convex. The bound becomes 

E(h(\S n \))/h(e). 

We use Markov's inequality applied to S* in order to obtain the results of Table [T] 



P(|5„| >e)=¥[\S k n \>e k j <E[\S*\) /e\ 

Formulas 

E(S„ 2 ) = u 2 {\n) 

E(S^) = u A {\n+\n(n-\)) 

E(Sjj) = u 6 {\n + n{n-\) + ln{n-\){n-2)) 

E(S s n ) = u 8 (±n + ^n(n-l) + ^n(n-l)(n-2) + §n(n-l)(n-2)(n-3)) 
are based on the binomial formula for independent symmetric random variables 

E ^ n )~ kl+k2 h +k J 2k) ' (2*0! (2*2)< (2k nV . ■ 
Proof. We first prove the formula below by induction on n for any exponent m. 

E(Xp)E(Xp) E(X„ m ") 



e(s>;;)= £ 



nr. 



mi+«i2H \-m„=m 



It holds for n = 1 . We now write the following identity based on the facts that X„ are independent 
and symmetric. E (S™ +1 ) = E ((S n +X n+ \) m ) is also equal to 

(P m \ \ P ml 
V y%+iom-m„ +] _ y '^2 F /V m "+i W (am-m n+ i\ 
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We expand the terms of the sum for m n+ \ = 0, . . . ,p and Y!l=i m i = m ~ m n+\ 

ml (m-ra.ii)!^^, 



(m-ra„ +1 )!m, 1+1 ! n"=i m ' ! /=i 

We end the proof for the even values of m after noticing that E {xf k+i ) = for any i and any k since 
X n are symmetric. □ 



4 Perspectives and concluding remarks 

To the best of our knowledge this paper presents the first application of Levy's inequality to software 
and system reliability of very long processes with an extremely low rate of failure. Our results allow any 
one to develop safe upper limits on the number of operations that a piece of numeric software should 
be permitted to undertake. In addition, we are finishing certification of our results with PVS. The major 
restriction lies in the fact that the slow process of proof checking has forced us to insist that individual 
errors are symmetric. 

At the time we are submitting this work, the bottleneck is the full certification of more results using 
PVS proof assistant. Yet this step is compulsory to provide full certification to future industrial uses. We 
anticipate no problem as these results are gathered in textbooks in computer science and mathematics. 
This library and future work will be included into NASA Langley PVS librar^Q as soon as it becomes 
stable. 

The main contribution of this work is that we selected theorems that produce significant results 
for extremely low probabilities of failure of systems that run for a long time and that are amenable to 
formal methods. During our work, we discarded many mathematical methods that would need too many 
operations or that would be too technical to be implemented with existing formal tools. 

Notice that this work can be applied to any sequence of independent and symmetric random variables 
that satisfy equation (f2]). It is worth pointing out one more time that violating our assumption (indepen- 
dence of errors) would lead to worse results, so one should treat the limit we have deduced with caution, 
should this assumption not be met. 
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